***Consider a computer system with a first-level data cache with the following characteristics: size: 16KBytes; associativity: direct-mapped; line size: 64Bytes; addressing: physical.***

***The system has a separate instruction cache and you can ignore instruction misses in this problem. This system is used to run the following code:***

***for (i=0; i<4096; i++)***

***X[i] = X[i] \* Y[i] + C***

***Assume that both X and Y have 4096 elements, each consisting of 4 bytes (single precision floating point). These arrays are allocated consecutively in physical memory. The assembly code generated by a naive compiler is the following:***

| Label | Instruction | Comment |
| --- | --- | --- |
| ***loop:*** | ***lw f2, 0(r1)*** | ***# load X[i]*** |
|  | ***lw f4, 0(r2)*** | ***# load Y[i]*** |
|  | ***multd f2, f2, f4*** | ***# perform the multiplication*** |
|  | ***addd f2, f2, f0*** | ***# add C (in f0)*** |
|  | ***sw 0(r1), f2*** | ***# store the new value of X[i]*** |
|  | ***addi r1, r1, 4*** | ***# update address of X*** |
|  | ***addi r2, r2, 4*** | ***# update address of Y*** |
|  | ***addi r3, r3, 1*** | ***# increment loop counter*** |
|  | ***bne r3, 4096, loop*** | ***# branch back if not done*** |

***a. How many data cache misses will this code generate? Break down your answer into the three types of misses. What is the data cache miss rate?***

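Each 64-byte line holds 64 / 4 = 16 elements, so X and Y each span 4096 / 16 = 256 lines.

cold misses : 2 × (4096 / 16) = 512

capacity misses : 0 (only one line of X and one line of Y are live at any time)

conflict misses : 4096 + (15/16) × 4096 = 4096 + 3840 = 7936

X and Y are each exactly 16 KB, the size of the cache, and are allocated back to back, so X[i] and Y[i] map to the same line of the direct-mapped cache. Assuming write-allocate, every store to X[i] misses because the load of Y[i] has just evicted X's line (4096 misses), and every load of Y[i] that is not a cold miss misses because the preceding store evicted Y's line ((15/16) × 4096 = 3840 misses). The load of X[i] misses only when it enters a new line (its 256 cold misses), since the previous store left X's line in the cache.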

total : 512 + 7936 = 8448 misses

There are 3 × 4096 = 12288 memory references (two loads and one store per iteration), thus

miss rate : 8448 / 12288 = 0.6875 => 68.75%

***b. Provide a software solution that significantly reduces the number of data cache misses. How many data cache misses will your code generate? Break down the cache misses into the three types of misses. What is the data cache miss rate?***

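Merge X and Y into a single array of structures so that X[i] and Y[i] are adjacent in memory and always share a cache line. Each {X[i], Y[i]} pair takes 8 bytes, so a 64-byte line holds 8 pairs and the loop touches 4096 / 8 = 512 distinct lines, each of which misses exactly once.

cold misses : 4096 / 8 = 512

capacity misses : 0

conflict misses : 0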

total : 512 misses

miss rate : 512 / 12288 = 0.0417 => 4.17%

***c. Provide a hardware solution that significantly reduces the number of data cache misses. You are free to alter the cache organization and/or the processor. How many data cache misses will your code generate? Break down the cache misses into the three types of misses. What is the data cache miss rate?***

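Make the cache set associative with at least two ways and LRU replacement, e.g. 16 KB, 2-way, 64-byte lines (128 sets). X[i] and Y[i] still map to the same set, but the set can hold X's line and Y's line at the same time, so neither evicts the other.

cold misses : 2 × (4096 / 16) = 512

Only every 16th iteration misses, on the loads of X[i] and Y[i] when each enters a new line; the store to X[i] always hits.

capacity misses : 0

conflict misses : 0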

total : 512 misses

miss rate : 512 / 12288 = 0.0417 (4.17%)